A consideration on time-frequency masking methods for speech separation

Authors

  • Ning DING
  • Takayuki SHIMADA
  • Masahshi YOSHIDA
  • Junya ONO
  • Wlodzimierz KASPRZAK
  • Nozomu HAMADA
Abstract

Time-frequency masking methods, primarily known through DUET [2] and SAFIA [3], are effective schemes for the blind speech separation problem. Based on an investigation of the conventional delay histogram and the time-frequency masking method in terms of the accuracy of the estimated delay, two novel approaches to the clustering process are proposed. In particular, the proposed methods tend to reduce the relatively large delay estimation error that arises when the STFT phase difference is used in the lower frequency band. The first approach clusters a frequency versus phase-difference dot plot on a frame-by-frame basis. The second uses a filter bank and a fractional delay operation. Several experiments show both approaches to be effective.
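
The low-frequency problem mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it only shows the generic per-bin delay estimate obtained by dividing the inter-channel STFT phase difference by the angular frequency, followed by a nearest-delay binary mask. All names (`stft`, `fractional_delay`, `bin_delays`, `nearest_delay_masks`), the white-noise test source, the 0.4-sample delay, the noise level, and the band limits are assumptions chosen for illustration.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def fractional_delay(x, delay_samples):
    """Circularly delay x by a possibly fractional number of samples (linear phase shift)."""
    freqs = np.fft.rfftfreq(len(x))  # cycles per sample
    spectrum = np.fft.rfft(x) * np.exp(-2j * np.pi * freqs * delay_samples)
    return np.fft.irfft(spectrum, n=len(x))

def bin_delays(X1, X2, n_fft):
    """Delay of channel 2 relative to channel 1, in samples, per bin (DC bin skipped)."""
    freqs = np.fft.rfftfreq(n_fft)[1:]
    phase_diff = np.angle(X2[:, 1:] * np.conj(X1[:, 1:]))
    # A fixed phase error is divided by a small frequency in the low band,
    # so the delay error grows as the frequency decreases.
    return -phase_diff / (2.0 * np.pi * freqs)

def nearest_delay_masks(delays, centers):
    """One binary mask per candidate delay: each bin is assigned to the closest center."""
    labels = np.argmin(np.abs(delays[..., None] - np.asarray(centers)), axis=-1)
    return [labels == k for k in range(len(centers))]

# Hypothetical test signal: one white-noise source, two noisy microphones.
rng = np.random.default_rng(0)
n_fft, true_delay = 512, 0.4  # delay in samples, small enough to avoid phase wrapping
source = rng.standard_normal(16000)
mic1 = source + 0.05 * rng.standard_normal(16000)
mic2 = fractional_delay(source, true_delay) + 0.05 * rng.standard_normal(16000)

delays = bin_delays(stft(mic1, n_fft), stft(mic2, n_fft), n_fft)
error = np.abs(delays - true_delay)
low_band_error = error[:, :16].mean()    # lowest 16 non-DC bins
high_band_error = error[:, 128:].mean()  # upper half of the spectrum
masks = nearest_delay_masks(delays, centers=[-0.4, 0.4])
```

Comparing `low_band_error` with `high_band_error` shows why a delay histogram built from all bins is blurred mainly by its low-frequency entries, which is the error the proposed clustering approaches target.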

Similar resources

Continuous time-frequency masking method for blind speech separation with adaptive choice of threshold parameter using ICA

We propose a novel method for blind speech separation using continuous time-frequency masking. The method is equipped with an adaptive choice of a threshold parameter that is based on utilization of ICA methods. We present a direct application that consists in the speech segregation for automatic transcription of spoken broadcasts disturbed by background music. Experimental results show improve...

Time-frequency masking for speech separation and its potential for hearing aid design.

A new approach to the separation of speech from speech-in-noise mixtures is the use of time-frequency (T-F) masking. Originated in the field of computational auditory scene analysis, T-F masking performs separation in the time-frequency domain. This article introduces the T-F masking concept and reviews T-F masking algorithms that separate target speech from either monaural or binaural mixtures...

Stereo-input speech recognition using sparseness-based time-frequency masking in a reverberant environment

We present noise robust automatic speech recognition (ASR) using sparseness-based underdetermined blind source separation (BSS) technique. As a representative underdetermined BSS method, we utilized time-frequency masking in this paper. Although time-frequency masking is able to separate target speech from interferences effectively, one should consider two problems. One is that masking does not...

Online blind speech separation using multiple acoustic speaker tracking and time-frequency masking

Separating speech signals of multiple simultaneous talkers in a reverberant enclosure is known as the cocktail party problem. In real-time applications online solutions capable of separating the signals as they are observed are required in contrast to separating the signals offline after observation. Often a talker may move, which should also be considered by the separation system. This work pr...

Modulation domain blind source separation for noisy speech mixture

In this paper, we propose a noise-robust blind speech separation (BSS) method by using two microphones. We first use modulation domain real and imaginary spectral subtraction (MRISS) to enhance both magnitude and phase spectra of the speech mixture inputs. We then estimate the direction of arrivals (DOAs) of the speech sources and perform time-acoustic-modulation frequency masking to recover th...

Publication date: 2009